Predicting the size and probability of epidemics in a population with heterogeneous 

infectiousness and susceptibility 



Joel C. Miller 

Mathematical Modeling & Analysis Group and Center for Nonlinear Studies, MS B284, Los Alamos National Laboratory 

(Dated: February 7, 2008) 

We analytically address disease outbreaks in large, random networks with heterogeneous infectiv- 
ity and susceptibility. The transmissibility T uv (the probability that infection of u causes infection 
of v) depends on the infectivity of u and the susceptibility of v. Initially a single node is infected, 
following which a large-scale epidemic may or may not occur. We use a generating function ap- 
proach to study how heterogeneity affects the probability that an epidemic occurs and, if one occurs, 
its attack rate (the fraction infected). For fixed average transmissibility, we find upper and lower 
bounds on these. An epidemic is most likely if infectivity is homogeneous and least likely if the 
variance of infectivity is maximized. Similarly, the attack rate is largest if susceptibility is homoge- 
neous and smallest if the variance is maximized. We further show that heterogeneity in infectious 
period is important, contrary to assumptions of previous studies. We confirm our theoretical pre- 
dictions by simulation. Our results have implications for control strategy design and identification 
of populations at higher risk from an epidemic. 



The spread of infectious disease is a problem of great 
interest Much work has focused on how diseases 

spread in networks of human, animal, or computer in- 
teractions @, [3, i, i, @, 0, 0, @|. The transmissibility, 
the probability that an edge transmits infection, has a 
network-dependent threshold (which can be zero) corre- 
sponding to a second order phase transistion above which 
an epidemic may happen and below which epidemics are 
not found. Ideally an intervention reduces the transmis- 
sibility or modifies the network to raise the threshold so 
that epidemics cannot occur. Most study has focused on 
determiningthe threshold value under varying assump- 
tions @, 0,11, El 1I| in order to design an optimal 
intervention. 

For many diseases and networks, it is impractical to 
reduce transmissibility sufficiently to eliminate the pos- 
sibility of an epidemic. An intervention strategy should 
therefore optimize competing goals: minimize social cost, 
reduce the probability a large-scale epidemic occurs, and 
reduce the attack rate (fraction infected) if an epidemic 
does occur. Recently theprobability and attack rate have 
been investigated @, 0, H , Q3, , but none of these has 
systematically investigated the effect of heterogeneity in 
transmissibilities. Heterogeneities can result from varia- 
tions in the application of interventions or from natural 
differences in the population such as variation in recovery 
time. It is often assumed that this special case can be 
mapped without loss of generality to recovery of all indi- 
viduals after a single time step [J, @, 0, d, 0, [H| and so 
the number of new cases from a single case is distributed 
binomially. However, it may be inferred from (l5j that 
this assumption is false. We have recently become aware 
of independent work [lj| which shows that recovery time 
heterogeneity reduces the epidemic probability, but has 
no effect on the attack rate. In this Letter, we consider 
how generic heterogeneities affect the epidemic probabil- 
ity and the attack rate if an epidemic occurs. 



The epidemics we study spread on random networks of 
N nodes with degree distribution given by P(k) where k 
is the degree. We use the SIR model nodes are di- 
vided into susceptible, infectious, and recovered classes. 
We modify the model to include heterogeneities. An in- 
fectious node u with infectivity X u connected to a suscep- 
tible node v with susceptibility S v infects v with prob- 
ability equal to the transmissibility T MV (X Ul S v ) of the 
edge. Infectious nodes recover and are no longer suscep- 
tible. The outbreak begins with a single infection (the 
index case) which spreads to neighboring nodes. If an 
epidemic occurs, the eventual number infected is O(N), 
otherwise the outbreak is localized. X and S can be quite 
arbitrary, e.g., X may be a vector representing time of in- 
fection, level of virus shedding, frequency of handwash- 
ing, etc. The form of T uv can also be quite general: it 
need only be intcgrablc and bounded in [0, 1]. 

The spread of an epidemic on a network with hetero- 
geneous infectivity and susceptibility is equivalent to a 
special case of directed percolation for which the prob- 
ability of retaining an edge depends on both the base 
and target node. In this formalism, infection spreads to 
the out-component of the index case 0, 0|. If the dis- 
ease has sufficiently high transmissibility a single giant 
strongly connected component G scc exists [l7j |. occupy- 
ing a fixed fraction of the network as A" — > oo. The set of 
nodes not in G scc , but from which G scc can be reached 
is denoted Gi, while the set of nodes not in G scc but 
reachable from G scc is denoted G [i for 'in' and o for 
'out'] as demonstrated in figure [T] If the index case is in 
Gi U G scc an epidemic occurs, infecting all of G U G scc 
and very few other nodes. In the limit N oo the prob- 
ability of an epidemic is the probability the index case is 
in Gi U G S cc and the attack rate is the fraction of nodes 
in G U Gscc [IU . We use V and A to denote the limit- 
ing epidemic probability and attack rate. In general the 
sizes of Gi and G Q may differ significantly so V / A. 
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FIG. 1: Schematic representation of G;, G scc , and G . All 
nodes in G scc can reach any other node in G scc - 

This contrasts with the case of homogeneous transmissi- 
bility where the problem can be mapped to undirected 
bond percolation @, [H, and V = A. 

We develop a general theory to find V allowing both in- 
fectivity and susceptibility to be heterogeneous. Generat- 
ing function approaches [20[ have been used to study dis- 



ease spread both inside the body [2l| or in society [3j, |J] . 
We modify these approaches to calculate V based on 
the distribution of X and S. Holding the average trans- 
missibility fixed, we then use Jensen's inequality to find 
distributions which give upper and lower bounds on V '. 
Because G and G, interchange roles if edge directions 
are reversed, A is calculated in the same manner. Our 
predictions are confirmed through simulations on a large 
Erdos-Rcnyi network. 

Each node u has an infectivity X u and a susceptibility 
S u chosen from independent distributions given by P(X U ) 
and P(S U ). Given the infectivity X u of u, the relation 
Tuv(X Ul S v ), and the distribution P(S), we define the out- 
transmissibility of u as 



T (u) 



J Tuv(X u , S V )P(S V ) dS v . 



(1) 



From (TT|) and the distribution P(X), we know the distri- 
bution P (T ). We similarly define the in-transmissibility 
Ti and its distribution Pj(Tj). P and Pi must yield the 
same average, but not all pairs P and Pi with the same 
average are consistent. For each P there exists at least 
one Pi and vice versa. Henceforth we consider just Pi 
and P , and do not use P(S) and P(X). 

We choose the index case uq uniformly from the pop- 
ulation. We classify infected cases by their generation, 
measuring the number of infectious contacts in the chain 
between them and uo (generation 0). We note that the 
generation time need not be fixed: generations may over- 
lap in time, changing the temporal dynamics but not af- 
fecting our results. 

Our class of random networks is defined by the Molloy- 
Reed algorithm [22| . Short cycles are rare. The neighbor- 
hood of uo is tree-like on successively longer length-scales 
as N — > oo. Consequently, V equals the probability that 
the transmission chains in an infinite tree are infinite. 

We define a probability generating function f(x) for 
the number of infected nodes in generation 1: 

f(x) = po +pix H lp 3 x 3 H , 



where pj is the probability that the index case directly 
infects j neighbors. The index case has degree k with 
probability P(k) and thus pj is given by 



Pj 



OO 

E 

k=j 



p{k) 



f Bi(k,j,T )P (T )dT , 
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where Bi(fc,j, T a ) is the likelihood of j successful trials 
from k attempts, each with probability T a . Note that pj 
depends on the distribution P Q but not Pj. 

In subsequent generations, the probability that a node 
is infected is proportional to its degree. Early in the epi- 
demic an infected node with degree k has k— 1 susceptible 
neighbors because the source of its infection cannot be re- 
infected. As such the probability qj that this individual 
infects j neighbors is 



Qj = 



1 oo -i 

— V kP(k) / Bi(k-l,j,T )P (T )dT . 



where (•} denotes the expected value. We let h(x) = 
^2 QjX 3 be the generating function for the number of new 
cases caused by a non-index case. The generating func- 
tion for the number of infections caused by n non-index 
cases is [h(x)] n . Consequently it may be shown that the 
generating function for the number of infections in gen- 
eration g > is given by 

where h 9 ~ x denotes composition of h with itself g — I 
times. For later use we rearrange / and h as 



„1 oo 

/ p (r )V[i + r (a:-i)] fc p(fc)dr , (2) 

Kx) = jf 1 ^) fy_ + To (x l)] fe - 1 fcP(fc) dT . (3) 

The extinction probability is limg^oo /(/i 9 ~ 1 (0)). To 
calculate this, we find limg^oo /i 9_1 (0) which is a solution 
to x = h{x). At most two solutions exist in the interval 
[0, 1], one of which is x = 1. If no other solution exists 
then x — 1 is a stable fixed point and V = 0. Otherwise 
the iteration converges to xq < 1 and 

V = \- f(x ) . 

Because / and h are independent of Pi, V is unaffected 
by heterogeneities in susceptibility. 

We now seek distributions P Q maximizing or minimiz- 
ing V subject to (T) = T* . In their investigation of 
recovery time heterogeneities, [3] showed that the prob- 
ability is maximized if recovery times are identical. We 
generalize this to arbitrary sources of heterogeneities in 



infectivity, using a similar proof. For notational conve- 
nience we use 6*(T) to denote the 5-function 5(T — T*), 
set 

1 oo 



k=l 



and rewrite ([3]) to explicitly show that h depends on P D 

h[P ](x) = f h(T ,x)P (T )dT . 
Jo 

We similarly define f[P ] (x). Because h is a convex func- 
tion of T, Jensen's inequality shows P = 5* minimizes 
h[P ](x). We denote the smallest root of x = h[5*](x) by 
x\. For x < x\ and any P Q , we have x < h[S*](x) < 
h[P ](x). Thus the root xo of x = h[P ](x) satisfies 
x i < so xo is minimized if P Q = 6* . 

Similar calculations show f [6*] (x) < f[P ](x) for all 
P D . Further, / [6*] (x) is an increasing function of x. 
Thus the extinction probability f[P ](xo) is minimized 
by P =6*. So homogeneous infectivity maximizes V. 

In addition, we find a new lower bound. Jensen's in- 
equality also implies that fixing (T) = T* but increas- 
ing (P 2 ) reduces V . Consequently, V is minimized by 
P (T ) = (1 - T*)6(T ) + T*S(T - 1). 

Thus we have shown that an epidemic is most likely 
if T Q is homogeneous and least likely if its variance is 
maximized. Analogously the attack rate is largest if Ti is 
homogeneous, and smallest if its variance is maximized. 

We expect that a threshold value of (P) exists above 
which epidemics can occur [x = h{x) has two roots] and 
below which they cannot. Allowing (T) to vary by con- 
tinuously changing P a , the fixed point x — 1 of x — h(x) 
bifurcates into two when h'(l) = 1. We find 



h'(l) 



(T){k 2 -k) 
(k) 



So the epidemic threshold is (P) = (k) / (k 2 — k), gener- 
alizing the results of 0, S| • 

We confirm our predictions by comparison with sim- 
ulations on an Erdos-Renyi network with 100000 nodes 
and (k) = 4. As N — ► oo, Erdos-Renyi networks with 
fixed average degree have Poisson degree distribution and 



h(x) = f(x) 



cxp[(k)T (x-l)}P {T )dT . 



For our first comparison, we consider the effect of vary- 
ing infection time. We discretize time, and take differ- 
ent models of recovery time given in the caption of fig- 
ure [2j For each time step, the probability of infecting 
a susceptible neighbor is p, and so an infected individ- 
ual with recovery time r has T Q = 1 — (1 — p) T . As a 
reference we consider the case where all infectious indi- 
viduals recover after exactly five time steps. We vary p 




FIG. 2: Comparison of theory (lines) with simulation (sym- 
bols). For the different distributions of infectivity (with sus- 
ceptibility constant), V changes, but A does not. We use 
constant recovery time r = 5 (A), r = or oo (0), t = 2 or 8 
(□), r = 1 or 10 (o), and finally a constant recovery rate (x). 




FIG. 3: Comparison of theory (curves) with simulation (sym- 
bols) for T uv — 1 — exp(— aI u S v ). The theoretical bounds are 
in dashed bold. The distributions are 0: P(%) = 5(1 — 1), 
P(S) = 0.5<5(S-0.001)+0.5<5(«S-1); x: P{1) = 0.55(1-0.3)+ 
0.5S(X - 1), P(S) = 0.2(5(5 - 0.1) + 0.85(5 - 1); o; P(l) = 
0.5J(T-0.1)+0.5<J(I-1), P(S) = 0.25(5-0.1)+0.8(5(5-l); 
□ : P(J) = 0.35(1 - 0.001) + 0.75(2: - 1), P(5) = 5(5 - 1). 



in order to change the average transmissiblity T*. The 
fraction of nodes with each recovery time is chosen such 
that £P(r)[l - (1 - P y\ = l-(l-p) 5 = T*. 

The results of several examples are shown in figure [2] 
Each data point represents 10000 simulations. Away 
from the epidemic threshold, there is a clear distinction 
between an epidemic and a non-epidemic outbreak. For 
definitcness, we define an epidemic to occur if over 500 
nodes arc infected. Theory and simulations are in good 
agreement. The upper bound for epidemic probability is 
realized by the case where all infections last exactly five 
time steps. The lower bound is realized by the case where 
some infections last forever and infect all neighbors, while 
the rest recover before infecting anyone. Because suscep- 
tibility is homogeneous, A does not vary. 

For our second comparison we perform calculations for 
systems with both X and S heterogeneous. We use the 
same Erdos-Renyi network, but assume that T uv = 1 — 
exp[— aX u S v ] where the distributions of X and S arc fixed 
and a is varied to tune T*. We find good agreement 
between theory and simulations in figure O 

We have shown in this Letter that the effect of gen- 
eral heterogeneity in infectivity and susceptibility on epi- 
demic probability V and attack rate A can be accurately 
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modeled using a generating function approach. We find 
that V and A may differ substantially. We have further 
shown that heterogeneity in recovery time has a signifi- 
cant effect on V and cannot be ignored. 

For fixed average transmissibility we have found upper 
and lower bounds for both V and A. Further we have 
found distributions realizing these bounds. For fixed av- 
erage transmissibility, increasing the variance of P de- 
creases V and increasing the variance of Pj decreases A. 

These results can be used to assist in designing control 
strategies. For example, if choosing between a strategy 
that reduces infectivity or susceptibility by half for all 
of the population or one that reduces infectivity or sus- 
ceptibility completely for half the population, it is bet- 
ter to choose the latter. As another example, consider 
a strategy which attempts to locate and isolate infect- 
eds compared with a strategy which attempts to provide 
susceptible individuals with protection. Both may be 
affected by inability to reach everyone. The first strat- 
egy has a heterogeneous impact on infectivity, while the 
second strategy has a heterogeneous impact on suscepti- 
bility. If the strategies have the same average impact on 
T then the first reduces the probability of an epidemic 
more while the second reduces its size more. Which strat- 
egy is optimal depends on whether the outbreak is small 
enough that an epidemic can be prevented. 

Our results can also be used to identify populations 
most at risk from epidemics. Populations with low ge- 
netic diversity are already known to be at particularly 
high risk from an outbreak because the lack of hetero- 
geneity allows the transmissibility to be higher. However, 
our results show that even for a fixed average transmis- 
sibility, a population with lower genetic variation will be 
more severely affected by a disease. 

For heterogeneous infectivity but homogeneous suscep- 
tibility, Newman 0] anticipated that A follows from the 
formulae derived under the assumption of homogeneous 
T. He did not address the effect on V . We have shown 
that A is independent of heterogeneity in infectiousness, 
and so for this special case the prediction is valid. How- 
ever, it fails if susceptibility is also heterogeneous. 

The theory developed here can be generalized in a 
number of ways. Most simply, we can introduce edge 
weights to represent some details of the contact between 
u and v. The same theory will hold, but the calculation 
of T Q and Ti as in ([T]) must incorporate the edge weight 
distribution. We can also introduce correlations between 
the distributions of Z, S, and k in an individual without 
significant theoretical difficulties, though the conclusions 
may change. It is more complicated to introduce corre- 
lations of X, S, or k between neighbors. 

We have assumed that the network has few short cy- 
cles. More realistic social network models incorporate 
significant clustering. However, at high transmissibili- 
ties, if any neighbors are infected, an epidemic is very 
likely. V is close to the probability that the initial node 



infects any neighbors and loops may be ignored. At low 
transmissibilities loops are not traced out by the infection 
and again may be neglected. Loops affect our results only 
at intermediate transmissibilities. The generating func- 
tion approach becomes difficult because even early in an 
outbreak an infected node may have multiple infected 
neighbors. 
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